Day 23 RAG and Cloud Non-Vector Search Services: A First Look at AWS Kendra and GCP Agent Builder

16th Ironman Contest rag information retrieval aws gcp

jay0810

An intern on the team who occasionally hides embarrassment behind code comments

2024-09-22 11:29:36


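Introduction

Yesterday we introduced AWS Bedrock Knowledge Bases and Azure AI Search, both of which can convert documents into vectors. However, there are also services that skip the vector step and simply run keyword search over unstructured documents, and these can be integrated with LangChain as well.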

Main Content

AWS Kendra

https://aws.amazon.com/tw/kendra/

Through connectors, data sources from AWS or from third parties can be integrated into Kendra.

Connector types: https://aws.amazon.com/tw/kendra/connectors/

To integrate with LangChain, you only need to provide the Kendra index ID.
https://python.langchain.com/v0.2/docs/integrations/retrievers/amazon_kendra_retriever/

from langchain_community.retrievers import AmazonKendraRetriever
retriever = AmazonKendraRetriever(index_id="")  # Set to your Kendra index ID
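To show roughly where such a retriever sits in a RAG flow, here is a minimal sketch that turns retrieved documents into a prompt. `FakeDocument`, `FakeRetriever`, and `build_prompt` are hypothetical names made up for this illustration and are not part of LangChain or AWS. The only thing assumed about the real retriever is that it offers `invoke(query)` and returns documents with a `page_content` attribute, as LangChain retrievers do.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class FakeRetriever:
    """Hypothetical offline retriever used only for this demo.

    For real use, pass an AmazonKendraRetriever instead.
    """

    def __init__(self, docs: List[FakeDocument]) -> None:
        self._docs = docs

    def invoke(self, query: str) -> List[FakeDocument]:
        # Naive keyword match: keep documents containing any query word.
        words = query.lower().split()
        return [
            d for d in self._docs
            if any(w in d.page_content.lower() for w in words)
        ]


def build_prompt(retriever: Any, question: str, max_docs: int = 3) -> str:
    """Retrieve documents and place them in a prompt as numbered context."""
    docs = retriever.invoke(question)[:max_docs]
    context = "\n\n".join(
        f"[{i}] {d.page_content}" for i, d in enumerate(docs, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


demo_docs = [
    FakeDocument("Kendra indexes documents through connectors."),
    FakeDocument("Unrelated text about cooking."),
]
demo_prompt = build_prompt(FakeRetriever(demo_docs), "kendra connectors")
print(demo_prompt)
```

With valid AWS credentials and a real index ID, the same `build_prompt` call should work when `FakeRetriever(demo_docs)` is replaced by the `AmazonKendraRetriever` instance, and the resulting string can then be sent to whichever LLM you use.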

AWS's official example of combining Kendra with LangChain

https://aws.amazon.com/cn/blogs/china/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/
This article includes extensive explanations and architecture diagrams.

Vertex AI Agent Builder (Vertex AI Search)

While reading the documentation, I kept running into both of these names. My current guess is that they refer to the same service.
https://cloud.google.com/products/agent-builder?hl=zh-cn

In practice, we work with two things: data stores and apps.

Types of data sources

For the LangChain integration, the main settings to explain are SEARCH_ENGINE_ID and DATA_STORE_ID.

from langchain_google_community import (
    VertexAIMultiTurnSearchRetriever,
    VertexAISearchRetriever,
)

PROJECT_ID = "<YOUR PROJECT ID>"  # Set to your Project ID
LOCATION_ID = "<YOUR LOCATION>"  # Set to your data store location
SEARCH_ENGINE_ID = "<YOUR SEARCH APP ID>"  # Set to your search app ID
DATA_STORE_ID = "<YOUR DATA STORE ID>"  # Set to your data store ID

retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    location_id=LOCATION_ID,
    data_store_id=DATA_STORE_ID,
    max_documents=3,
)
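Since the snippet above defines both SEARCH_ENGINE_ID and DATA_STORE_ID but only passes the data store ID, here is a small sketch of how the choice between the two could be made explicit. `vertex_search_kwargs` and `make_retriever` are hypothetical helpers written for this post. I am assuming, based on the LangChain integration page, that `VertexAISearchRetriever` accepts `search_engine_id` as a keyword argument in the same way it accepts `data_store_id`. Requiring exactly one of the two is this sketch's own simplification, not a documented library rule.

```python
from typing import Any, Dict, Optional


def vertex_search_kwargs(
    project_id: str,
    location_id: str,
    *,
    data_store_id: Optional[str] = None,
    search_engine_id: Optional[str] = None,
    max_documents: int = 3,
) -> Dict[str, Any]:
    """Build keyword arguments for the retriever from one of the two IDs."""
    # Sketch-level rule: exactly one identifier, so the two cases stay distinct.
    if bool(data_store_id) == bool(search_engine_id):
        raise ValueError("Provide exactly one of data_store_id or search_engine_id")

    kwargs: Dict[str, Any] = {
        "project_id": project_id,
        "location_id": location_id,
        "max_documents": max_documents,
    }
    if data_store_id:
        kwargs["data_store_id"] = data_store_id  # query a data store directly
    else:
        kwargs["search_engine_id"] = search_engine_id  # query through a search app
    return kwargs


def make_retriever(**kwargs: Any) -> Any:
    """Create the real retriever; needs langchain-google-community and GCP credentials."""
    # Imported lazily so vertex_search_kwargs can be tried without the package.
    from langchain_google_community import VertexAISearchRetriever

    return VertexAISearchRetriever(**kwargs)


# Placeholder values for illustration only.
demo_kwargs = vertex_search_kwargs(
    "my-project", "global", data_store_id="my-data-store"
)
print(demo_kwargs)
```

With credentials configured, `make_retriever(**demo_kwargs).invoke("your question")` would be the next step. I have not run that part against a live project here.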